Provable and practical approximations for the degree distribution using sublinear graph samples
Authors: Talya Eden, Shweta Jain, Ali Pinar, Dana Ron, C. Seshadhri
Abstract
The degree distribution is one of the most fundamental properties used in the analysis of massive graphs. There is a large literature on graph sampling, where the goal is to estimate properties (especially the degree distribution) of a large graph through a small, random sample. The degree distribution estimation poses a significant challenge, due to its heavy-tailed nature and the large variance in degrees. We design a new algorithm, SADDLES, for this problem, using recent mathematical techniques from the field of sublinear algorithms. The SADDLES algorithm gives provably accurate outputs for all values of the degree distribution. For the analysis, we define two fatness measures of the degree distribution, called the h-index and the z-index. We prove that SADDLES is sublinear in the graph size when these indices are large. A corollary of this result is a provably sublinear algorithm for any degree distribution bounded below by a power law. We deploy our new algorithm on a variety of real datasets and demonstrate its excellent empirical behavior. In all instances, we get extremely accurate approximations for all values in the degree distribution by observing at most 1% of the vertices. This is a major improvement over the state-of-the-art sampling algorithms, which typically sample more than 10% of the vertices to give comparable results. We also observe that the h and z-indices of real graphs are large, validating our theoretical analysis.

ACM Reference format: Talya Eden, Shweta Jain, Ali Pinar, Dana Ron, and C. Seshadhri. 2016. Provable and practical approximations for the degree distribution using sublinear graph samples. In Proceedings of , , , 12 pages.
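To make the estimation problem concrete, here is a minimal Python sketch. It is not the SADDLES algorithm, whose details the abstract does not give. It shows the naive baseline of estimating the complementary cumulative degree distribution from uniformly sampled vertices, plus an h-index computation. The function names are hypothetical, and the h-index definition used here (the largest h such that at least h vertices have degree at least h) is an assumption borrowed from the bibliometric h-index, since the abstract does not spell it out.

```python
import random


def ccdf(degrees, d):
    """Exact fraction of vertices with degree >= d (reads every vertex)."""
    return sum(1 for x in degrees if x >= d) / len(degrees)


def sampled_ccdf(degrees, d, num_samples, rng):
    """Naive estimate of ccdf(d) from uniform vertex samples, with replacement.

    Only num_samples degrees are inspected, so the cost is independent of
    the graph size, but rare high-degree vertices are easily missed.
    """
    n = len(degrees)
    hits = sum(1 for _ in range(num_samples) if degrees[rng.randrange(n)] >= d)
    return hits / num_samples


def h_index(degrees):
    """Largest h such that at least h vertices have degree >= h.

    Assumed definition; the abstract names the h-index without defining it.
    """
    h = 0
    for rank, d in enumerate(sorted(degrees, reverse=True), start=1):
        if d >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy heavy-tailed degree sequence: many low-degree vertices, a few hubs.
    degrees = [1] * 900 + [10] * 90 + [100] * 10
    print("exact ccdf(10):  ", ccdf(degrees, 10))
    print("sampled ccdf(10):", sampled_ccdf(degrees, 10, 200, rng))
    print("h-index:         ", h_index(degrees))
```

In the toy sequence only 1% of the vertices are hubs, so a small uniform sample often contains few or none of them. This is the heavy-tail difficulty the abstract refers to.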
Similar resources
Sampling from social networks’s graph based on topological properties and bee colony algorithm
In recent years, the problem of sampling massive social network graphs has attracted much attention, because a small, representative sample can be analyzed much faster than the full network. Many algorithms have been proposed for sampling social network graphs. The purpose of these algorithms is to create a sample that is approximately similar to the original network’s graph in terms of properties such as de...
Sublinear-Time Algorithms for Monomer-Dimer Systems on Bounded Degree Graphs
For a graph $G$, let $Z(G, \lambda)$ be the partition function of the monomer-dimer system defined by $\sum_{k} m_k(G)\lambda^k$, where $m_k(G)$ is the number of matchings of size $k$ in $G$. We consider graphs of bounded degree and develop a sublinear-time algorithm for estimating $\log Z(G, \lambda)$ at an arbitrary value $\lambda > 0$ within additive error $\epsilon n$ with high probability. The query complexity of our algorithm does not depend on th...
Existence and Iterative Approximations of Solution for Generalized Yosida Approximation Operator
In this paper, we introduce and study a generalized Yosida approximation operator associated to H(·, ·)-co-accretive operator and discuss some of its properties. Using the concept of graph convergence and resolvent operator, we establish the convergence for generalized Yosida approximation operator. Also, we show an equivalence between graph convergence for H(·, ·)-co-accretive operator and gen...
Floating-Point LLL: Theoretical and Practical Aspects
The text-book LLL algorithm can be sped up considerably by replacing the underlying rational arithmetic used for the Gram-Schmidt orthogonalisation by floating-point approximations. We review how this modification has been and is currently implemented, both in theory and in practice. Using floating-point approximations seems to be natural for LLL even from the theoretical point of view: it is t...
Exact Shortest Path Queries for Planar Graphs Using Linear Space
We provide the first linear-space data structure with provable sublinear query time for exact point-to-point shortest path queries in planar graphs. We prove that for any planar graph G with non-negative arc lengths and for any ε > 0 there is a data structure that supports exact shortest path and distance queries in G with the following properties: the data structure can be created in time O(n lg(...
Journal title: CoRR
Volume: abs/1710.08607
Pages: -
Publication date: 2017